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Abstract 

The power spectrum estimated from a galaxy redshift survey is the 
real spectrum convolved with a window function, so when estimating 
the power on very large scales (for small k), it is important that this 
window function be as narrow as possible. A method that achieves 
this is presented. The optimal fluctuation estimate is found to be the 
Fourier transform of the number density fluctuations n/n—l weighted 
by a function ipo , and the optimal ipo is found to be the ground-state so- 
lution of the Schrodinger equation, with the inverse selection function 
as the potential. This quantum mechanics analogy occurs basically be- 
cause we want the weight function to be narrow both in Fourier space 
(to give a narrow window) and in real space (to minimize the variance 
from shot noise). An optimal method for averaging the estimates at 
different k is also presented, generalizing the standard procedure of av- 
eraging over shells in k-space. Finally, a discrete version of the method 
is presented, dividing space into "fuzzy pixels", which has the advan- 
tage of being able to handle redshift distortions in a straightforward 
way. 



1 Published in ApJ, 455, 429-438, December 20, 1995. Submitted February 3, 1995. 



1 Introduction 



Our observational knowledge of the large-scale structure of the universe is 
growing at a formidable rate. Fifteen years ago, redshifts had been measured 
for a few thousand galaxies. Today the figure is around 10 5 , and ongoing 
projects such as the AAT 2dF Survey and the Sloan Digital Sky Survey 
will raise it to 10 6 within a few years. It is thus very timely to develop 
analysis techniques that allow us to make the most of this data. One of 
the most important cosmological quantities that we wish to extract from 
redshift surveys is the power spectrum P(k). Although this function has 
been estimated from a wide variety of galaxy samples, selected in optical 
(Baumgart k Fry 1991; Park et al. 1992; Vogeley et al. 1992; Park et al. 
1994, Vogeley 1994 - hereafter V94), infra-red (Fisher et al. 1993 - hereafter 
F93; Feldman, Kaiser & Peacock 1994 - hereafter FKP) and radio (Peacock 
& Nicholson 1991) frequencies, there is still some disagreement about both 
its overall normalization and its behavior on very large scales. The former 
problem is related to the issue of biased galaxy formation - see Peacock 
& Dodds (1994) and references therein. Sorting out the behavior on very 
large scales, however, is more an issue of survey geometry and data analysis. 
FKP find evidence for P(k) turning over, whereas Park et al. (1992) claim 
a continued rise up to 200/i _1 Mpc. As pointed out by many authors (e.g. 
F93, FKP, V94, Efstathiou 1994), the P(k) we estimate is the true power 
spectrum convolved with some window function, and the differences between 
the window functions of various workers may be part of the explanation for 
the discrepancies. 

The window functions depend on both the survey geometry and the 
analysis technique employed. Although we may wish we had a method 
where the window functions were delta functions, it is easy to see that this 
is impossible given a finite survey volume. Given this constraint, it is natural 
to ask which analysis technique gives the narrowest window functions. The 
purpose of the present paper is to answer this question. 

Before embarking on this program, it is instructive to compare this ap- 
proach with that of two other recently published methods. Both the method 
of FKP and the Karhunen-Loeve eigenmode method of V94 are optimal in 
the sense of maximizing the signal-to-noise ratio within a certain class of 
techniques, without specifically paying attention to the width of the win- 
dow functions 2 . This corresponds to minimizing the vertical error bars that 

2 The latter method nonetheless tends tend to sharpen the window functions, although 
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are conventionally placed on power spectrum estimates. To indicate the 
width of the window functions, i.e., the extent of spectral blurring, one can 
also place horizontal error bars on the estimates. This has recently become 
fashionable when plotting estimates of the cosmic microwave background 
power spectrum from various experiments, and would be welcome also in 
the context of galaxy surveys. The key point to remember is that if we are 
concerned about both vertical and horizontal error bars, then there is no 
one optimal method. The reason is that there is a trade-off between the two. 
As pointed out in e.g. F93, if one places more weight on the most distant 
galaxies in a flux-limited survey, then the horizontal error bars get narrower 
but the vertical ones grow, because of increased shot noise contributions. A 
method is clearly superior to another if it produces smaller error bars both 
horizontally and vertically. We will find a method that is unbeatable in this 
sense, with a free parameter 7 that allows the desired ratio of the two error 
bars to be specified by the user. 

It should be emphasized that the concept of window functions and "hor- 
izontal error bars" (under various names) and the advantage of them being 
narrow has been extensively discussed in the prior literature (e.g. FKP, 
F93, V94). V94 even provides explicit plots of the window functions for 
some specific examples. Thus what is new in the present paper is mainly 
the derivation of a method that explicitly optimizes the window functions. 

This paper is organized as follows. In Section 2, we derive the rank one 
estimate of -P(k) that is optimal in the above sense. In Section 3, we give 
examples for a number of common survey configurations, and attempt to 
provide some physical intuition for how the geometry affects our ability to 
estimate P(k). In Section 4, we generalize the traditional averaging over 
shells in k-space by deriving the optimal way to combine the estimates for 
different k-modes. In Section 5, we present a pixelized version of the treat- 
ment, which has the advantage of readily being able to handle additional 
elements of realism such as redshift-space distortions. The conclusions are 
summarized in Section 6, and some mathematical details are given in the 
Appendix. 

not optimally, as a kind of side effect, indicating that narrow windows are "natural" in 
some information-theoretic sense. 
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2 The optimal weight function 



Just as FKP, we model the observed galaxy distribution as a 3D Poisson 
process ra(r) = J2i H r ~ r i) with intensity A(r) = ra(r)[l+5 r (r)]. The function 
n is the selection function of the galaxy survey under consideration. As is 
customary, we model the density fluctuations S r as a Gaussian random field 
with power spectrum -P(k) = P(k). We estimate the fluctuation amplitude 
at wave vector k* by 3 

F(k*) = J tp(r)n(r)d 3 r (1) 

and wish to find the optimal choice of the weight function (p. (In Section 
4, we will see that this Ansatz is essentially unbeatable, since all quadratic 
power estimators lead back to this one.) In the Appendix, we show that 

<F(k*)> = ^(0), (2) 
<|F(k*)| 2 > = |^(0)| 2 + ||^(k)| 2 P(k)^ + ||^(r)| 2 -iyrf 3 r. (3) 

This corresponds to equation (2.1.6) of FKP. Hats denote Fourier trans- 
forms, and we have defined tp(r) = n(r)<^(r). From here on, we will find it 
convenient to use the standard Dirac quantum mechanics notation with kets 
and bras, where as usual the wave number operator k = — iV. Defining our 
raw estimate of the power at wave vector k as 

Q(k)^|F(k)| 2 -|^(0)| 2 , (4) 

we can write equation (3) simply as (Q(k)) = (ip\P(k) + V(r)\ip), where 
we have defined the "potential" function V(r) = l/n(r). Thus our power 
estimate is a sum of two terms: a weighted average of the cosmological 
power -P(k), the weights being the window function |t/>(k)| 2 , and a contri- 
bution from shot noise. Since the integral of the window function should be 
unity, the correct normalization of ip is evidently (ip\ip) = 1. Subtract- 
ing the shot noise contribution, we obtain the unbiased power estimate 
Q(k) — (ip\V(r)\ip). In the approximation of FKP that -F(k) is Gaussian, 

including a random mock survey as in equation (2.1.3) in FKP can never give minimal 
error bars, since inclusion of random numbers will always increase the variance of the 
estimator. 
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the variance of this estimate is simply 4 (|F(k)| 4 ) - (|F(k)| 2 ) 2 = (Q(k)) 2 . In 
other words, we minimize the vertical error bars on our power estimate by 
minimizing (Q(k)). Since for any reasonably narrow window function, the 
cosmic contribution (ip\P(k)\ip) ~ P(k*), independent of ip, this means that 
we simply want to minimize the shot noise contribution (ip\V(r)\ip) . 

Let us summarize our results so far: a good ip should be narrow both 
in Fourier space (to give a narrow window function, i.e., small horizontal 
error bars) and in real space (to minimize the impact of shot noise and thus 
give small vertical error bars). In other words, we arrive at the following 
optimization problem: minimize (ip\p(\i) -\-V(r)\tp) subject to the constraint 
that (ip\ip) = 1. Here p is some penalty function that we choose to be small 
near the k* which we wish to estimate and large elsewhere. Introducing a 
Lagrange multiplier E, this leads to the eigenvalue equation 

\p(k) + V(r)]W) = EW). (5) 

Throughout this paper, we will estimate the power at wave vector k* by 
chosing the quadratic penalty function p(k) = |k — h*\ 2 /2j, i.e., try to 
make the window function narrow in the least squares sense. By direct 
substitution, it is readily seen that equation (5) has the solution 

^(r) = Vo(r)e- 8k *- r , (6) 
where ipo is the solution to the Schrodinger equation 

1>o = Eip (7) 

with the smallest eigenvalue E, the ground state. Continuing our quantum 
analogy, we see that we arrived at the quantum ground state because we 
wanted the normalized ipo that minimized the total "energy", where the 
"kinetic energy" (-00 1 k 2 1 V^o) /2 corresponded to the horizontal error bar and 
the "potential energy" y(ipo\V(r)\ipo) corresponded to the vertical error bar. 
In conclusion, we see that we can rewrite equation (1) as 

F(k*)^ f e- ik '- r M^)^jd 3 r. (8) 

4 This follows from the fact that the real and imaginary parts have the same variance, 
which in turn follows from the assumption of random phases. A real-valued -F(k) would 
gives twice this value. Since a translation in real space corresponds to a phase modulation 
in Fourier space, the random phases assumption follows directly from assuming that the 
statistical properties of the galaxy distribution are translationally invariant, an assumption 
that of course underlies the entire power spectrum formalism. 



--V 2 + 7 y(r) 
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In other words, although there was no mention of Fourier transforms is 
our Ansatz (1), our optimal power estimator F(k*) is simply the Fourier 
transform of the field F(r) = tpo(r)^j. Since n is just a sum of delta 
functions, this reduces to 

F(k*) = £ e -^|M, (9) 
where the sum is to be taken over all galaxies observed in the survey. 

3 Examples 

In this section, we give the optimal weight function ipo for a few selected 
survey geometries, and discuss some of their properties. A general feature 
of all solutions is seen to be that they fall off smoothly near the survey 
boundaries, and tend to assign very low weights to points near corners and 
sharp edges. Basically, this is because it helps a lot in Fourier space without 
costing much in real space. 

3.1 Volume-limited surveys 

For a volume-limited survey, n(r) equals some constant when r is in the 
survey volume, and vanishes otherwise. This corresponds to the potential 
V(r) being infinite outside the survey volume, so we can without loss of 
generality set V(r) = inside 5 and obtain the quantum-mechanical particle- 
in-the-box problem: ipo must satisfy the Helmholtz equation [V 2 -\-2E]tp = 
inside the region and vanish on its boundary. For a volume-limited all sky 
survey complete out to a radius R, we obtain 

^o(r)ocjo(^— J, (10) 

where jo(x) = (sin a;) /a;. If we approximate the volume in a pencil-beam 
survey by a cylinder as in Kaiser & Peacock (1991), with radius R, length 
L and cylindrical coordinates p = [x 2 + y 2 ] 1 ^ 2 < R, < z < L, we obtain 

tpo(r) oc sm(7rz/L)J (k p/R), (11) 

5 Continuing our quantum analogy, we get V(r) = by subtracting the constant energy 
j/n from the Hamiltonian in brackets in equation (7). As we know, such a change in the 
(arbitrary) zero point makes no physical difference: all eigenvalues E will simply decrease 
by j/n, and the eigenfunctions will remain the same. 
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where ko ~ 2.405 and Jo is the zeroth Bessel function. A more realistic pencil 
beam survey contains a conical volume of opening angle 29\, in spherical 
coordinates defined by 9 < 6\, r < R. Both this case and that of a slice 
9q < 9 < 0i, tpo < ip < (pi, r < R are treated in Pockels (1891). The 
solutions are a spherical Bessel function of non-integer order in the radial 
direction, an a non-integer spherical harmonic for the angular part. We will 
return to these solutions in subsection 3.3 below. 

3.2 Flux-limited and diameter-limited surveys 

For a flux-limited survey, the selection function n(r) has the radial depen- 
dence (e.g. Peebles 1993, p214) 



where (f> is the galaxy luminosity function and / the flux limit of the survey. 
The Schechter luminosity function (f>(L) oc L a e- L / L *, a = -1.07 ± 0.05, 
gives n(r) = T(l + a, Air fr 2 / L*), an incomplete Gamma function. In the 
crude approximation <f>(L) ~ L~ 2 , we obtain n(r) oc r~ 2 . Thus V(r) oc r 2 
for an all-sky survey, and the Schrodinger equation (7) becomes that of the 
harmonic oscillator, whose familiar ground state is 



in equation (7). Thus we see that the parameter 7, which specifies how 
concerned we are about shot noise relative to our desire to keep the window 
function narrow, determines the effective depth of the survey. A greater 
depth of course narrows the window function, at the price of more shot 
noise. For the Schechter model, V(r) grows exponentially for large r, which 
causes the resulting ipQ to fall off even faster than a Gaussian at large radii 6 . 
Here the effect of changing 7 is found to be much smaller, as this exponential 
cutoff defines a survey depth rather naturally. Notice that in the extreme 
case of volume-limited surveys discussed above, where the survey depth was 
completely fixed, the dependence of ipo on 7 vanished entirely. 

For diameter-limited surveys, the calculation of n is completely analo- 
gous, with the angular distribution function replacing the luminosity func- 
tion in equation (12). 

6 It is easy to see that l^ol 2 will indeed always fall off faster than l/r 3 V(r), for otherwise 
the "potential energy" (?/>|\ / (r)|?/>}, which is the expected shot noise contribution, would 
be infinite. 




(12) 



a Gaussian. Here R oc 7 
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3.3 Combinations: realistic pencil beams, slices, etc. 

Most realistic surveys are volume-limited in the angular directions (due to 
incomplete sky coverage) and flux-limited or diameter-limited in the radial 
direction. They thus result in a Schrodinger equation with a purely radial 
potential, supplemented with the boundary conditions that ip must vanish on 
the survey boundaries, as discussed in subsection 3.1 above. This problem is 
readily attacked by separation of variables in spherical coordinates, writing 
the weight function as 



(14) 



Let us restrict our attention to the following two common cases: a pencil 
beam where n = outside the cone 9 < 9\ and a A9 X Atp slice where 
fi = outside the region \<p\ < Atp/2, \9 — 7r/2| < A9/2. (A slice which is 
not centered on the equator can of course be put in this form by rotating 
the coordinate system.) The ground state solutions for and $ are the 
non-integer spherical harmonics given by Pockels (1891). When the survey 
subtends merely a small angle in the sky (<C one radian), these angular 
solutions reduce to 

eowv) « Jo ( k -f) (is) 



for the pencil beam case and 



e(9)$(<p) « cos -I- 



( 7TLp\ 
COS — COS 

\A^J 



7T 

A9 



( Try \ ( ttz \ 



(16) 



for the case of the slice, which is readily shown by direct substitution into 
the standard expression for V 2 in spherical coordinates. The corresponding 
angular eigenvalues, which we denote Eq,, are Eq, = (fco/#i) 2 for the pencil 
beam and E^ = {it / A9) 2 + (tt/Alp) 2 for the slice. (k « 2.405 as before.) If 
the exact results of Pockels are used for the angular part, these eigenvalues 
will of course be slightly different. 

Inserting this into equation (7), we obtain the following ordinary differ- 
ential equation for the radial function R(r): 



(rR)' 



^ + 2 7 V(r 



2E 



rR. 



(17) 



Assuming that V(r) remains finite at r = 0, the first term on the right hand 
side will dominate near the origin, giving a power-law solution 



R(r) 



:i8) 
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for small r. Substituting this into equation (17) and requiring ip to remain 
finite at the origin (a > 0), we obtain 

VI + 4E a -l 
a = . (19) 

Equation (17) is now readily solved numerically by the "shooting" method, 
using equation (18) as initial data and integrating the differential equation 
for various choices of the constant E. The correct ground-state eigenvalue 
E is of course the one for which the resulting solution R(r) approaches zero 
as r —7- oo and has no zero-crossings — see e.g. Hajj et al. 1974 for details. 

If a good approximation is considered satisfactory, the standard varia- 
tional method (making an Ansatz with a few free parameters and minimizing 
{ip\H \ip) / (ip\ip)) is a recommended alternative approach, as all one needs is 
the ground state. 

Regardless of the exact form of the radial selection function, we can draw 
a general conclusion from equations (18) and (19): ip shuns sharp corners. 
For instance, for a slice with A9 = 20° and A(p = 30°, we get a ~ 10, so 
that ip °S r w for small r, assigning almost no weight at all to the galaxies 
closest to us, in the sharp corner of the slice. A pencil beam with an opening 
angle of 2°, i.e., with 9\ = 1°, is even more extreme, giving a ~ 137. 

Note that for several identical, non-intersecting volumes (such as pencil 
beams), the ground state is degenerate (just as in the corresponding quan- 
tum problem). Thus there will be one solution ip\ which is nonzero only in 
the first "pencil" , another equally good solution ip2 which is nonzero only in 
the second "pencil" and is just a translated and rotated version of tpi, etc. 
These solutions give E just as low as a combination like (ip\ + 1P2) /V%- This 
leads to the perhaps surprising conclusion that two pencil beams cannot 
give narrower window functions than one 7 . Notwithstanding, it is of course 
worthwhile to make several independent pencil beam surveys, since averag- 
ing different modes as in Section 4 helps reduce both the variance from shot 
noise and the sample variance. 



3.4 The k-space picture 

Figure 1 is an attempt to clarify what is going on in k-space. The funnel- 
shaped surface is the Cold Dark Matter (CDM) power spectrum -P(k) = 

7 This result is not at odds with the "cross-correlation" argument described in V94. 
The latter combines several different modes, as is done in the following section, but in 
such a way that the power estimate is not positive definite. Thus the "cross-correlation" 
window function W(k) can take negative values, which renders its usefulness unclear. 
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P{^Jk 2 x + k 2 y + k 2 z ) of Bond and Efstathiou (1984), with htt = 0.5 and k 

measured in units of (/i _1 Mpc) _1 . If the radial profile looks unfamiliar, 
it is because the axes are linear rather than logarithmic. For clarity, the 
third coordinate has been suppressed in the plot (k z = 0). The two surfaces 
below are examples of window functions. The estimated power is obtained 
by simply multiplying the window function with the "funnel" and integrating 
over all k. We have seen that by changing k* in equation (8), we can slide 
our window function around and center it at a point of our choice in k-space, 
but that its shape will remain unchanged, being simply \ipo\ 2 . This simply 
reflects the well-known fact that the spectral resolution Ak is independent 
of k. Thus the principal difficulty is to estimate the power at very long 
wavelengths, near the origin k = 0, where -P(k) may change dramatically 
on the scale of the width of the window function. To accurately probe the 
largest scales A = 2ir/k 30Mpc, where P(k) oc k, we thus need a window 
function that is so narrow that it "fits inside the funnel" . Although narrower 
window functions can always be achieved with deeper surveys (wider ipo), 
the two window functions plotted qualitatively illustrate two subtle problems 
that can occur even if the survey depth L 1/k. The width (standard 
deviation) of ipo determines the scale on which the central peak of ipo falls 
off. Since the example on the left has a very narrow central peak, we see that 
it indeed corresponds to a very deep survey. Unfortunately, the "ringing" in 
Fourier space (often referred to as sidelobes) spoils things, since even if we 
place the central peak in the middle of the funnel, the dominant contribution 
will come from the sidelobes picking up power from much larger k. Such 
sidelobes are caused by sharp edges in ipo, caused for instance by incomplete 
sky coverage, and it is precisely this problem that the method presented in 
this paper attempts to minimize. 

The second window function example, on the right, has a different prob- 
lem. Although ipo is evidently smooth (as there are no sidelobes), the geom- 
etry is highly anisotropic. This is qualitatively what happens for a pencil 
beam (where \ipo\ 2 will be shaped like a thin disc in k-space) and a thin 
"slice" (where \ipo\ 2 will be a pencil in k-space). Thus even though the win- 
dow function \ipo\ 2 is narrow enough in the most favorable direction (because 
the survey is deep enough in one or two dimensions), it will still not "fit in- 
side the funnel". Rather, the power estimate will be dominated by the ends 
of the window function protruding out into parts of k-space where k and 
P(k) are much larger. This well-known phenomenon has been emphasized 
by e.g. Kaiser & Peacock (1991), who argued that the apparent power excess 
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at 128/i _1 Mpc in a pencil beam survey reported by Broadhurst et al. 1990 
was probably due to such leakage from shorter wavelengths. In summary, 
the ability for a survey to probe the longest wavelengths is limited by its 
weakest link, the dimension in which it is the narrowest. 

Figure 2 gives a quantitative example of sidelobe reduction for the above- 
mentioned case of a volume-limited, all-sky survey of depth R = 300/i _1 Mpc. 
Since the main difficulty is very small k, we plot the most large-scale window 
function attainable, that devised to estimate -P(O). The optimal ipo from 
equation (10) gives the window function 



They have both been correctly normalized, to integrate to unity. They both 
vanish at k = for phase space reasons - -P(O) is of course theoretically un- 
observable. For comparison, two unnormalized power spectra, whose shapes 
one would like to be able to discriminate between, are plotted; the CDM 
spectrum of Figure 1 and the a Baryonic Dark Matter (BDM) spectrum. 
The latter (Hu 1995) has = 0.2, h = 0.8 and a thermal history where the 
universe remained 10% ionized. 

4 The optimal weighting of Fourier modes 

So far, we have discussed how to estimate the power -P(k) at any given 
wave vector k. The galaxy clustering is generally believed to be isotropic, 
which means that P in fact depends only on k = |k|, the magnitude of 
the wave vector. The conventional way to estimate P(k) is to average the 
estimates of -P(k) over a fairly thin spherical shell in k-space of radius k, 
thereby reducing the vertical error bar. We will now examine more general 
weighting schemes. 

4.1 The most general quadratic estimator 

It is easy to see that the most general power estimator whose expectation 
value is linear in -P(k) must be quadratic in the field quantities, i.e., of the 




(20) 



where x = kR, whereas the naive choice ipo constant gives 




(21) 
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form 

Q(k*) = / A(r, r')^^ld\d\' - B(k*) (22) 
J n{r)n{r) 

— estimators with higher-order terms would give quantities like -P(k) 2 in the 
expectation values, and be fairly useless assuming that S r is Gaussian. The 
functions A and B are arbitrary. We lost no generality by inserting n above, 
as these functions could always be absorbed into A. It is straightforward to 
show that 

(Q(k*)> = J W(k)P(k)d 3 k + 6(k*) - B(k*), (23) 

i.e., that the expectation value of the estimator is always just the power 
spectrum convolved with some window function W plus a bias term b from 
shot noise etc. that is independent of P(k). Thus one should simply choose 
B(k*) = b(k*) to get an unbiased estimator. The interesting part is how 
to choose the function A. Since Q(k*) must be real- valued, we can without 
loss of generality choose A to be Hermitean, i.e. A(r, r') = A(r', r)*. This 
means that, viewed as an operator, all its eigenvalues to, are real and it can 
be expanded in a set of orthogonal eigenfunctions as 

A(r,r') = X;^(r)^(r')*. (24) 

i 

Thus equation (22) becomes 

2 

-B(k), (25) 

which we recognize as simply a linear combination of estimates of the type 
studied in Section 2 - see equations (1) and (4). Normalizing the the func- 
tions as (ipi\ipi) = 1, as is Section 2, Q(k*) must indeed be simply a weighted 
average, where J2 w i = 1- To guarantee that the window function W be non- 
negative, we will require to, > 0, i.e. that A be positive definite. Resuming 
the quantum analogy, we can thus think of the most general A as a density 
operator 

A = £> t -|^-)<^|, (26) 

i 

and write the normalization and expectation value equations as simply trA = 
1 and (Q(k*)> = tr {[P(k) + V(t)]A}, respectively. 

Thus hopefully having demystified the notion of the most general power 
estimator slightly, showing that it is really nothing more than a weighted 



Q(k*) = $>i J 



n(r) 
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average of simple estimates estimates like in equation (4), it is clear that 
we want to split our analysis into two steps: first estimate the power at a 
grid of points in k-space, with as narrow window functions as possible, then 
for each A;- value of interest, average these estimates in some clever way to 
reduce the variance. We dealt with the first step in Section 2, so let us now 
take the function ipo as given, and find the optimum way to average the 
different Fourier modes. 

4.2 The optimal weights 

We define the covariance function 

C(k, k') ee <F(k)*F(kO> - <F(k)>*<F(k)>. (27) 

Let us split it into a sum of two contributions, one from cosmic power and 
one from shot noise, as C(k, k') = C c (k,k') + C s (k,k'). In the Appendix, 
we show that (F(k)) = V'o(k) and 

C c (k,k') = J ^ (k" - k')>o(k" - k)P(k")rf 3 A;", (28) 
C s (k,k') = | e -( k '- k )-|^ ( r )|2_l_ rf 3 r _ (29) 

This corresponds to equation (2.2.2) in FKP, with their function S giving 
the shot noise. Since the functions ipQ and \ipo\ 2 /n both have widths of the 
order of the size of the survey volume L, their Fourier transforms tend to 
fall off on a coherence scale Ak ~ 1/L — see FKP. Assuming that -P(k) 
does not vary much when k changes by such a small amount, we can factor 
it out of equation (28), apply the convolution theorem and obtain 

C c (k,k') « P (^~) J e-^'- k ^\Mr)\ 2 d 3 r. (30) 

Note that for volume-limited surveys, where n(r) is constant, C c and C s dif- 
fer merely by a factor nP(k). Also note that for a highly anisotropic survey 
volume, such as a pencil-beam or a slice, the coherence length is correspond- 
ingly anisotropic in k-space, being shortest in the direction corresponding to 
the greatest dimension of the survey volume. In other words, the coherence 
behaves in much the same way as the window function |t/>o(k)| 2 , as in the 
lower right of Figure 1. 
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Suppose we have evaluated the Q(k) of equation (4) at the N different 
k- values {ki, kjy}, and arranged the shot-noise corrected results in an 
TV-dimensional vector q, i.e., 

q t =Q(k t )- f |^ ( r )| 2 -l^ 3 r. (31) 

These points k 8 are preferably chosen in the vicinity of the sphere |k| = k* , 
and it is clearly redundant to use points separated by much less than the 
coherence length Ak, as this adds almost no additional information. 
Let us define another TV-dimensional vector, e, that has all its components 
equal to one, i.e., e 4 - = 1. As our estimate P(k) of the true power P(k), we 
take a weighted average P(k) = w • q, where the weights to, > add up to 
one, i.e., e • w = 1. The expectation value of our estimate is simply 



where 



(P(k)) = J W(k)P(k)d% (32) 

N 

W(k) = £^(k-k,-)| 2 , (33) 



8 = 1 



and its variance is 



(P(k) 2 ) - P(k) 2 = ^w'Cw, (34) 

where the matrix C is given by C\j = 4|C(k 4 -, kj)| 2 . Just as in FKP, we are 
of course forced to make some assumption about the power level P(k) to be 
able to compute the function C and estimate our error bars. 

The variance clearly decreases if we make the distribution to, less concen- 
trated, effectively averaging more independent modes. For an all-sky survey 
with no zones of avoidance, ipo and \J)q will De spherically symmetric, so 
averaging over a shell in k-space will not widen the window function jy(k) 
at all. However, this is not a very common survey geometry, and we may in 
addition wish to do some averaging in the radial direction to further reduce 
the variance, while still striving to keep the window function fairly narrow. 
As in the last section, we attempt to minimize a penalty 

J p(k)W{k)d 3 k = f • w, (35) 

where fi = /(k 4 -) and the function / works out to be the convolution of |t/>| 2 
with p: 

/(k)= [ $(k'-k)\ 2 p(k')d?k'. (36) 
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Note that this time the penalty function p depends only on the magnitude 



We thus arrive at the following optimization problem: 
Minimize w*Cw/2 + f • w subject to the constraints that e • w = 1 and 
to, > 0. This is a quadratic programming problem. However, there is hardly 
much point in delving into complicated numerical methods at this step, as 
we merely want a decent solution and do not care if it is exactly optimal. 
In this spirit, it is quite easy to obtain an approximate solution as follows. 
Introducing a Lagrange multiplier A, we minimize w*Cw/2 + f • w — Ae • w 
and obtain Cw = Ae — f . Eliminating A using e • w = 1, we get 



where e' = C _1 e and f = C _1 f . To get a feeling for what is going on, let us 
study a simple special case. If the points {k 8 } are all separated by several 
coherence lengths, C will be approximately diagonal. If we furthermore 
have chosen all the k 8 to lie near a spherical shell |k| = k* and the survey 
geometry is spherically symmetric, then we can choose units where C = I, 
the identity matrix, and the solution reduces to 



Hence the weights are just a constant minus the window-convolved penalties 
fi. If some fi exceeds the average by more than 1/N, W{ will become neg- 
ative, which is forbidden. If this happens, we clearly get a reasonably good 
result by simply setting all negative weights equal to zero and rescaling. We 
get an even better result if we reduce N by omitting these offending k-values 
and solving again, a procedure which is easy to iterate until all weights are 
positive. This procedure is of course merely a useful numerical trick which 
produces an almost optimal solution with little numerical effort — if the 
exact optimum is desired, then a standard software package for quadratic 
programming should be employed instead. It should be stressed that even 
with an approximate solution for the weights, the resulting power spectrum 
estimate is of course 100% correct — the resulting error bars will merely be 
slightly larger than the optimal ones. 



k = Ikl. 




(37) 




(38) 
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5 The pixelization method 



The problem of estimating P(k) from galaxy surveys has many elements in 
common with that of estimating the angular power spectrum Ci from cos- 
mic microwave background (CMB) sky maps - see Tegmark (1994, hereafter 
T94), Tegmark (1995) and references therein. One obvious difference is that 
CMB maps are pixelized and galaxy surveys are not. In this section, we dis- 
cretize the galaxy survey problem by dividing the sky into N "fuzzy pixels" . 
This allows the calculations of the two preceding sections to be combined 
into a single eigenvalue problem. The main advantage is that the eigenvalue 
problem is no longer one of differential operators but one of finite matrices. 
Thus it can be always be solved numerically even when we add in various 
extra complications, such as for instance redshift distortions, evolution and 
power-spectrum weighting, as outlined at the end of the section. The main 
disadvantage is that since N X N matrices become numerically cumbersome 
to diagonalize if N exceeds a few thousand, we must make our 3D pix- 
els fairly wide and thus sacrifice some spatial resolution. This makes the 
method most suitable for extracting the power at long wavelengths, where 
spatial resolution is not a problem. 

Let us define the overdensity in N "pixels" z\, zn by 

Zi = J ^(r)[n(r)-n(r)]rf 3 r. (39) 

Although the weight functions ipi may be chosen to live only on disjoint 
volumes (sharp pixelization, in which case the pixel values reduce to being 
basically counts in cells) , it is preferable to reduce sidelobes in Fourier space 
by chosing the <pi to be smooth, for instance Gaussians <fi(r) oc e _ l r_r 'l / 2<7 » . 
It is convenient to chose the widths to be slightly greater than the typical 
separation between neighboring points r 4 -, and to place points more sparsely 
in the outskirts of the survey volume. 

We define the matrix- valued functions S*(k) and T(r) by 
Sij(k) = V>,-(k)*V>j(k) and T tJ (r) = ipi(r)*ipj(r), where ^(r) = n(r)^-(r). 
The rest of the analysis now becomes very similar to that in T94, so in the 
interest of brevity, we will simply copy that notation and refer to Section 
III of that paper for explanations. The power estimate 

N' 

D(k*) = Z t Ez = J2»k(e k -z) 2 (40) 
k=i 



15 



has the expectation value 

(D(k*)) = Jw(k)P(k)d 3 k + B, (41) 



where the window function is 



N' 

W(k) = tr {ES(k)} = J2 a k e{S(k')e k (42) 

k=i 



and the bias from shot noise is 

N' 



B = J2»k4 [T(r)V(r)d 3 r 
k=i u 



e k . (43) 



Minimizing J p(k)W(k)d 3 k + B (note that the penalty function p depends 
only on the magnitude k = |k|) subject to the constraint that J W(k)d 3 k = 
1, we obtain the generalized eigenvalue problem 

(Q - \R)e k = 0, (44) 
where we have defined the N X N matrices 

S(k)p(k)d 3 k + J T(r)V(r)d 3 r, (45) 

R = / ^(k)^. (46) 



Since both Q and R are Hermitean (T is in addition real), this generalized 
eigenvalue problem has N orthogonal solutions, which we normalize and 
sort by their eigenvalues (the smallest eigenvalue corresponds to k = 1, the 
best solution, etc.). Most standard eigenvalue packages (such as the public- 
domain package EISPACK, or that included in NAG) provide a specialized 
routine for precisely this problem: the generalized eigenvalue problem where 
Q and R are Hermitean, and R is positive definite. Now the trade-off be- 
tween vertical and horizontal error bars becomes quite transparent. The 
larger we choose N', i.e., the more modes we include (with equal weighs 
a = 1/N', say), the smaller the variance of our power estimate D(k*) and 
the wider the window function W(k) = J W(k)dQ k . Since the modes are 
sorted by decreasing merit, we can for instance plot vertical and horizon- 
tal error bars versus N', the number of modes included, and then fix N' 
according to our preferences, analogously to what is done in a Karhunen- 
Loeve analysis a la V94. 
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5.1 Redshift distortions 



A ubiquitous problem with power spectrum estimation is that of "redshift 
distortions". When estimating the radial distance to a galaxy by its red- 
shift, galaxies receding faster than the Hubble flow due to local gravitational 
interactions appear to be further away than they really are, and vice versa. 
This was first discussed by Kaiser (1987), and a recent review is given by 
Tegmark & Bromley (1995). Denoting the apparent density field S s (r), it is 
straightforward to use Kaiser's formalism to show that in linear perturbation 
theory, 

8 s (k) = 6 r (k) + pJ f(k, kO£ r (k')d 3 A;', (47) 

where (3 = Q os /b, the constant b is the so called bias factor, and the function 
/ is defined by 

/(k, k') = e 8 '( k '- k )- r 

Thus we obtain 

<S s (k)*S s (k')> = J g(k,k',k")P(k")d 3 k", (49) 

where 

fl (k,k',k") = 5(k-k")5(k'-k")+p 2 f(k,k")*f(k',k") + 

+ /3[5(k-k")/(k',k) + 5(k'-k")/(k,k')*]. (50) 

The above expressions are derived and discussed in detail by Zaroubi & 
Hoffman (1995). The key point here is that although no longer diagonal, 
and rather messy, the expression for (S s (k)*S s (k')) is still linear in the power 
spectrum. Thus the expectation value of a quadratic estimator will still be 
some noise term plus a term linear in P(k). In other words, if the treatment 
in the previous sections is repeated with S s in place of S r , all the optimization 
problems will retain the same form, merely with messier looking expressions 
for the window functions — window functions that now probe the power 
directly in real space, not in redshift space. Although the resulting analog of 
equation (5) will generally become too complex to admit analytic solutions, 
the extra complication is of no importance in the case of equation (44) 
above, since the latter is to be solved numerically anyway. All that happens 
is that the matrices to be diagonalized change somewhat. Since all linear 
complications of the problem are numerically "free" , we mention below two 
more elements of realism that are straightforward to take into account. 
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5.2 Evolution 



If smoothed on a scale much greater than the Jeans length, the field of 
density fluctuations 8 r maintains its shape in linear perturbation theory, 
simply increasing in amplitude by a position-independent growth factor D. 
Since we are seeing distant galaxies at an earlier time, we see the apparent 
density fluctuations 

8 a (r) = D(z)8 r (r), (51) 

where the redshift z = H^r/c if z <C 1 and D(z) = 1/(1 + z) for the simple 
case Q = 1. Equation (3) now gets replaced by 

(|F(k*)| 2 ) = |(F(k*))| 2 + / |^(k)| 2 P(k)^ + / Jp^d\ (52) 

where we have redefined ip(r) = D(r)n(r)(p(r). This correction is quite 
small for contemporary galaxy surveys, where n typically varies dramatically 
between z = and z = 0.2, a range over which D changes by at most about 
20%, less for small S7. 

This fluctuation evolution should not be confused with galaxy evolution, 
which affects only n and not S r . 

5.3 Power weighting 

Our estimators probe a weighted average of P(k). This means that in regions 
where P(k) is concave, i.e., P"(k) > 0, we get an overestimate, and vice 
versa. In general, averaging with a window function causes the least harm 
when applied to a function that is fairly constant, since then "leakage" from 
the sides has little impact. If we have certain preconceptions about the 
behavior of P, believing it to have roughly the form P*(k), say, is thus 
better to define a rescaled power spectrum D(k) = P(k)/P*(k) and set 
out to estimate the fairly constant function D(k) instead. This is precisely 
what was done with the CBR spectrum in T94, and we will not repeat 
it here as it is completely analogous. The result is that the Schrodinger 
eigenvalue problem (5) gets replaced by a generalized eigenvalue problem 
(E gets multiplied by an operator) which is harder to solve. The discrete 
eigenvalue problem in equation (44) is already of the generalized type, so 
this is no complication at all. For instance, when going for the power on 
very large scales, one may chose P*(k) = k to make the window functions 
shun high frequencies more than they would otherwise. 
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6 Discussion 



We have presented a method for estimating the 3D power spectrum P(k) 
that maximizes the spectral resolution, i.e., makes the window functions 
in A;-space as narrow as possible given a level of shot-noise variance. The 
method decomposes into two separate steps: 

1. Estimate -P(k) for a 3D grid of k- values with separations of the order 
of the coherence length. 

2. For each A;- value of interest, estimate P(k) by taking a weighted average 
of the estimates of -P(k). 

We found that Step 1 amounted to the the following: 

• Determine the weight function ipo(r) from the survey geometry by find- 
ing the ground state solution of the Schrodinger equation with the in- 
verse selection function as potential. 

• Weight the galaxies by the inverse selection function times ipQ. 

• Fourier transform. (This simply involves a sum over all the observed 
galaxies.) 

• As the power estimate, take the square modulus of the Fourier trans- 
form minus |t/>o(k)| 2 and the k-independent noise bias / ^d 3 r. 

It is noteworthy that whereas the second step does, this first step does not 
involve any assumptions about the power spectrum P(k). 

For Step 2, the standard approach of giving equal weight to all k- values 
on a spherical shell of radius k, and zero weight to all others, gives a decent 
estimate if the survey volume is fairly spherical. In general, the optimal 
weighs are obtained by inverting a certain matrix as described in Section 
4. For pencil-beams and slices, those k-values near the shell that get the 
greatest weight are those that correspond to the widest directions of the 
survey volume, just as one would expect. 

From the discussion at the end Section 3, we conclude that if a fixed 
number of square degrees are available for a redshift survey, then 

• one should not try to make the area oblong, in a naive attempt to 
"probe larger scales", 

• one should not split the area into several disjoint pieces, and 

• the optimal survey shape is a circle, corresponding to a conical 3D 
geometry. 
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In Section 5, we presented a method where the survey volume was di- 
vided into a large number of "fuzzy pixels" . This discretization of the prob- 
lem makes it quite analogous to the analysis of pixelized cosmic microwave 
background (CMB) sky maps, and the solution is of course the same as 
well: the optimal power estimators are eigenvectors of a matrix eigenvalue 
problem, just as for the CMB case treated in T94. Because of the limited 
spatial resolution of the pixels, this method is most suited for estimating 
the power at the longest wavelengths. A major advantage is that it offers 
a way of taking linear regime redshift distortions into account without any 
approximations. 

Although it is hoped that the methods presented in this paper will prove 
useful in estimating the amount of power on the largest scales, it should be 
stressed that wide window functions are by no means the only challenge we 
face in this endeavor. One notorious difficulty is that n is usually not known 
a priori, but estimated from the survey itself, which can gives the impression 
of artificially low large-scale power as discussed in FKP. Another potential 
source of trouble, for flux-limited surveys, is that the bias parameter may 
depend on the luminosity class of galaxies under consideration. Since the 
most remote parts of the survey probe mainly the brightest galaxies, the 
observed 8 r would have a spatial modulation. Although the sign of this 
modulation is probably the opposite of the evolution effect discussed in the 
previous section, the net result would be the same: an overestimate of the 
large-scale power. 

Let us conclude by comparing the results from Section 2 with those of 
FKP. They derive the weight function that minimizes the vertical error bars, 
the variance of the estimate, without specific regard to what happens to the 
window function. With our notation, they find the optimal solution to be 



If n(r)P(k) > 1 in the central regions of the survey, this implies that their 
optimal ipQ is fairly constant there, which agrees with our conclusions: the 
ground state of the Schrodinger equation does not vary much in regions 
far from the survey boundaries. Thus the powerful conclusions that can 
be drawn from the FKP formula about the merits of sparse sampling, for 
instance, remain valid also with the analysis method presented here. Rather, 
the difference between the two methods manifests itself near the edges, where 
our ipQ always approaches zero faster than the ipo of FKP. The latter falls 




(53) 
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off no faster than n does (for instance, it goes abruptly to zero at any sharp 
boundary). This is why it gives a wider window function in k-space. 

The author wishes to thank George Efstathiou, Karl Fisher, Alan Heav- 
ens, John Peacock and Michael Vogeley for useful comments on the manuscript, 
and Wayne Hu and Naoshi Sugiyama for kindly providing unpublished power 
spectra. This work has been partially supported by European Union con- 
tract CHRX-CT93-0120 and Deutsche Forschungsgemeinschaft grant SFB- 
375. 

Appendix 

In this Appendix, we derive equations (2), (3), (28) and (29). 

As mentioned, we model the galaxy distribution as a 3D stochastic point 
process n(r) = J2i H r ~ r i) which is a Poisson process with intensity (average 
point density) A(r). A Poisson process satisfies (see e.g. Appendix A of FKP) 

Mr)) = A(r), (54) 
(ra(r)ra(r')) = A(r)A(r') + 5(r - r')A(r). (55) 

Here A is it self a random field, A(r) = ra(r)[l + 5 r (r)], where the density fluc- 
tuations S r are a Gaussian random field satisfying the standard expressions 
(5 r (r)) = and 8 

(5 r (k)*5 r (k')) = (2yr) 6 5(k - k')P(k). (56) 

There are thus two separate random steps involved in generating n: first 
the generation of S r , then the Poissonian distribution of points. To make 
this distinction clear, will will occasionally use double brackets, where the 
inner bracket denotes expectation values over S r . For instance, ((ra(r))) = 
(A(r)) = ra(r). Given a function tpo, we define F(r) = Lpo(r)n(r) and wish 
to compute the mean and autocorrelation of its Fourier transform 

F(k) = J e- tk - r F(r)d 3 r. (57) 

Inserting the definition of F, we obtain 

«F(k)» = J e - 8k -^ (r)((n(r))rf 3 r = J e^M*) = MV, (58) 

8 We adopt this slightly unconventional 27r-convention in the definition of -P(k) because 
it eliminates the nuisance of repeated occurrences of (27r) 3 throughout the main part of 
the paper. 
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where ipo(r) = n(r)<^o(r) as before, and 

«F(k)*F(k')» = J |e-( k '' r '- k '^ (r)Vo(r')((n(r) Jl (r')}) ( l 3 rrf 3 r'. (59) 

The contribution to ((ra(r)ra(r'))) from the second term in equation (55) 
gives 

f e - 8 Xk'-k).r Mr)| 2- (r)rf 3 r = /■ e -.-(k'-k).r]^Ml! d 3 r) (6Q) 

J J 7i I i* j 

which we identify as the shot noise contribution C s in equation (29). The 
contribution to ((ra(r)ra(r'))) from the first term in equation (55) gives 

J J e- 8 ( k '- r '- k - r Vo(r)Vo(r') [1 + (5 r (r)5 r (r r ))] d 3 rd 3 r'. (61) 

The first of these terms is separable and reduces to simply ((F(k)))*((F(k))). 
Both ipQ and S r are real- valued, so using the Fourier identity 

j e-^MrySr(r)d 3 r = J ^ ( k " - k)*5 r (k")d 3 k" , (62) 

the second term yields 

J y^ o (k w -k0>o(k''-k)(5 r (k'0*5 r (k''0)d 3 A ; ''d 3 A ; w 

= j Mk" - k')> (k" - k)P{k")d 3 k'\ (63) 

which is the cosmic contribution C c in equation (28). This completes the 
derivation of the covariance function C(k,k'). The special case of equa- 
tion (3) is obtained by setting k = k' = 0. 
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Figure 1: Problems in k-space. 
The funnel-shaped surface is a standard CDM power spectrum P(k x , k y , k z ), 
with k z = 0. The window-function on the lower left suffers from wide 
sidelobes due to a sharp edge in the weight function ipQ. The window function 
on the lower right suffers from being highly anisotropic, such as is the case 
for pencil beam and slice surveys. 



24 



OA 




k/2n 

Figure 2: Window functions before and after optimization. 
The two solid lines show window functions for estimating the power at k = 
in a volume limited survey of depth 300/i _1 Mpc. The heavy line corresponds 
to the optimal choice of ipo, whereas the one with all the sidelobes (shaded) 
corresponds to naively chosing ipo constant. Two popular power spectra, 
BDM (dashed) and CDM (dotted) are plotted for comparison. 
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